Unsupervised, corpus-based method for extending a biomedical terminology
Abstract
Objectives: To automatically extend an existing biomedical terminology downwards, that is, with more specific terms, using a corpus together with lexical and terminological knowledge.
Methods: Adjectival modifiers are removed from terms extracted from the corpus (three million noun phrases extracted from MEDLINE), and the resulting demodified terms are looked up in the terminology (the UMLS Metathesaurus, restricted to disorders and procedures). A MEDLINE phrase becomes a candidate term for inclusion in the Metathesaurus if two requirements are met: (1) a demodified term created from the phrase is found in the terminology, and (2) for a given semantic category, the modifiers removed to create the demodified term also modify existing terms in the terminology. A sample of the candidate terms was reviewed manually.
Results: Of the three million simple phrases randomly extracted from MEDLINE, 125,000 were identified as new terms for inclusion in the UMLS. Of the 1,000 terms reviewed manually, 83% were associated with a relevant UMLS concept.
Discussion: The limitations of this approach are discussed, as well as issues of adaptation and generalization.
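The two acceptance requirements in the Methods can be illustrated with a minimal Python sketch. It assumes a toy representation that the abstract does not specify: each phrase and each terminology entry is a tuple of (word, tag) pairs, with a hypothetical tag "ADJ" marking adjectival modifiers, and the terminology is a dict from entry to a semantic category such as "disorder". Requirement 2 is approximated by checking that each removed adjective already modifies at least one entry of the same category. The function names (`demodify`, `modifier_index`, `candidates`) and the toy data are illustrative, not taken from the paper, and the sketch does not query the UMLS.

```python
from itertools import combinations

# Hypothetical representation (not from the paper): a phrase or terminology
# entry is a tuple of (word, tag) pairs; the tag "ADJ" marks adjectival modifiers.
Phrase = tuple[tuple[str, str], ...]


def text(phrase: Phrase) -> str:
    """Lower-cased surface string, used as the lookup key."""
    return " ".join(word.lower() for word, _ in phrase)


def demodify(phrase: Phrase):
    """Yield (demodified phrase, removed modifiers) for each non-empty
    subset of adjectival modifiers that can be dropped."""
    adj_positions = [i for i, (_, tag) in enumerate(phrase) if tag == "ADJ"]
    for k in range(1, len(adj_positions) + 1):
        for removed in combinations(adj_positions, k):
            kept = tuple(tok for i, tok in enumerate(phrase) if i not in removed)
            if kept:  # skip the empty result when every token is an adjective
                yield kept, {phrase[i][0].lower() for i in removed}


def modifier_index(terminology: dict) -> dict:
    """Map each semantic category to the adjectives found in its entries."""
    index = {}
    for term, category in terminology.items():
        for word, tag in term:
            if tag == "ADJ":
                index.setdefault(category, set()).add(word.lower())
    return index


def candidates(phrases, terminology: dict) -> dict:
    """Return {phrase text: category} for new phrases meeting both requirements."""
    lookup = {text(term): cat for term, cat in terminology.items()}
    mods = modifier_index(terminology)
    found = {}
    for phrase in phrases:
        if text(phrase) in lookup:
            continue  # already in the terminology, so not a new term
        for kept, removed in demodify(phrase):
            category = lookup.get(text(kept))
            # Requirement 1: the demodified term exists in the terminology.
            # Requirement 2 (approximated): every removed modifier already
            # modifies some entry of that same semantic category.
            if category and removed <= mods.get(category, set()):
                found[text(phrase)] = category
                break
    return found


# Toy data, invented for illustration only.
terminology = {
    (("lung", "NN"), ("cancer", "NN")): "disorder",
    (("bronchitis", "NN"),): "disorder",
    (("chronic", "ADJ"), ("bronchitis", "NN")): "disorder",
}
phrases = [
    (("chronic", "ADJ"), ("lung", "NN"), ("cancer", "NN")),  # "chronic" modifies a disorder
    (("recent", "ADJ"), ("lung", "NN"), ("cancer", "NN")),   # "recent" never does
]
print(candidates(phrases, terminology))  # {'chronic lung cancer': 'disorder'}
```

Because every non-empty subset of adjectives is tried, a phrase with several modifiers can match through a partially demodified form. The abstract does not say whether modifiers are removed one at a time or all together, so this is a design choice of the sketch rather than a description of the original method.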
Similar resources
LoLo: A System Based On Terminology For Multilingual Extraction
An unsupervised learning method, based on corpus linguistics and special language terminology, is described that can extract time-varying information from text streams. The method is shown to be ‘language-independent’ in that its use leads to sets of regular expressions that can be used to extract the information in typologically distinct languages like English and Arabic. The method uses the i...
Using Domain-Specific Verbs for Term Classification
In this paper we present an approach to term classification based on verb complementation patterns. The complementation patterns have been automatically learnt by combining information found in a corpus and an ontology, both belonging to the biomedical domain. The learning process is unsupervised and has been implemented as an iterative reasoning procedure based on a partial order relation indu...
Unsupervised Event Extraction from Biomedical Text Based on Event and Pattern Information
In this paper, we propose a new method for extracting events from biomedical texts. It extends patterns in an unsupervised way based on event and pattern information. Evaluated on the GENIA corpus, our system achieves 90.1% precision and 70.0% recall.
Unsupervised Classification of Biomedical Abstracts Using Lexical Association
The task of text classification is the assignment of labels that describe texts’ characteristics, such as topic, genre or sentiment. Supervised machine learning techniques such as Support Vector Machines or the simple but effective Naïve Bayes have been successfully applied to this task. However, it is not always practical to acquire a sufficient corpus of labelled examples to train these metho...
Journal of Biomedical Informatics
With the growing use of Natural Language Processing (NLP) techniques for information extraction and concept indexing in the biomedical domain, a method that quickly and efficiently assigns the correct sense of an ambiguous biomedical term in a given context is needed concurrently. The current status of word sense disambiguation (WSD) in the biomedical domain is that handcrafted rules are used b...
Publication date: 2002